ABSTRACT

Thirteen graduate students were asked to indicate for
each of 24 multiple-choice items whether the item tested "recall of
specific information," a "higher order skill," or "don't know." The
students were also asked to state their general basis for judging the
items. The 24 items had been previously classified according to
Bloom's cognitive-skills hierarchy. The results of the study
supported the hypothesis that the examinees' judgment of the
cognitive process being measured by each item is influenced by the
structure of the item, for example, stem length. (Author)

This study is an outgrowth of a study of memory vs. inference processes
(as seen in the report of multiple-choice item-solution processes). It examines
the hypotheses with which a student may approach a test item, whether or not
(s)he may know the answer.

Observation of the classroom testing situation suggests that students
often have hypotheses about the processes being elicited by a multiple-choice
item (e.g., whether it measures "recall" or is a "thought" question), which
presumably enter into the student's strategy for either answering or guessing.
Theoretical considerations, too, suggest that since an achievement test can
be viewed as an experiment (with instruction as the independent variable, and
the score as the dependent variable) the concept of "demand characteristics"
might be applied to the situation. (This term is borrowed from the field of
social psychology, where it refers to characteristics of an experiment other
than the treatment that affect the subjects' responses.)

In the case of multiple-choice items, students are known to try to
"psych-out the professor." If a student doesn't know the answer to an item
(s)he can attempt to deduce it using strategies that fall under the rubric of
"test-wiseness." In so doing the student may well hypothesize whether the item
is testing recall of specific material or some "higher order" cognitive skill.
Despite current emphasis on so-called criterion-referenced tests, the criterion
behavior is typically never observed. Rather, approximations of that behavior
are elicited through a number of techniques including multiple-choice tests.
An understanding of the role of hypotheses about the intended process of an
item may aid in the elimination of dysfunctional "demands," thus presenting
a means of increasing the criterion validity of such test scores.

The purpose of the present study was to determine whether students can
reliably categorize items as intending to test recall or higher order skills
based simply on the structure of the items. In addition, information regarding
the cues employed for this classification was sought.

Method 

Students from Ithaca College and the University of Pennsylvania were asked
to read two sets of twenty-four multiple-choice items; one set from a module
dealing with glaciers, the other concerning the periodic table of atomic
numbers (Diamond & Williams, 1972). The students read only the items and not
the associated reading passages. Each was asked to indicate whether, in his/her
judgment, the item required the recall of specific information or some
higher-order skill. The students were then asked to write down their basis for
judging the two types of items.

* Paper presented at meetings of the National Council for Measurement
in Education, 1975. This research was supported by a grant from the
Spencer Foundation.

The sets of 24 items had been originally written and judged to reflect
Bloom's (1956) cognitive-skills hierarchy of Knowledge, Comprehension,
Application, and Analysis (by Kropp et al., 1956). Six items of each type
were randomly arranged in booklet form, one item per page.

A correct classification of an item was defined as at least majority
agreement with the judges' previous classification. ("Recall" corresponded
with Knowledge and "higher-order skill" with any of the other three categories.)
Internal consistency measures of the ratings were calculated by assigning a
"score" of 1.0 if the student's item classification was the same as that of
the judges, and 0.0 otherwise. "Items" was treated as a repeated factor and
the mean square for subjects x items as an estimate of error variance.
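
The scoring and consistency steps described above can be sketched as follows.
This is only an illustration, not the authors' actual computation. The function
names and the toy ratings are hypothetical, "majority" is assumed to mean more
than half of the students, and the reliability is assumed to take the Hoyt
analysis-of-variance form, (MS subjects - MS subjects x items) / MS subjects,
which the paper does not state explicitly.

```python
# Dichotomy from the text: Knowledge -> recall ("R"); the other three
# Bloom categories -> higher-order skill ("HOS").
BLOOM_TO_DICHOTOMY = {
    "Knowledge": "R",
    "Comprehension": "HOS",
    "Application": "HOS",
    "Analysis": "HOS",
}


def score_matrix(ratings, judged_levels):
    """Score 1.0 where a student's label matches the judges', else 0.0.

    ratings[s][i] is student s's label for item i ("R", "HOS", or "DK");
    judged_levels[i] is the judges' Bloom category for item i.
    A "DK" (don't know) response never matches, so it scores 0.0.
    """
    targets = [BLOOM_TO_DICHOTOMY[level] for level in judged_levels]
    return [[1.0 if r == t else 0.0 for r, t in zip(row, targets)]
            for row in ratings]


def majority_correct(scores):
    """Per item: True if more than half of the students matched the judges.

    Assumption: "majority" means strictly more than half.
    """
    n_students = len(scores)
    return [sum(col) > n_students / 2 for col in zip(*scores)]


def hoyt_reliability(scores):
    """ANOVA-based internal consistency (assumed Hoyt form).

    The subjects x items mean square serves as the error term.
    Undefined (division by zero) if all students have the same total.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(col) / n for col in zip(*scores)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_items = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_items  # subjects x items
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / ms_subjects


if __name__ == "__main__":
    # Toy data (hypothetical): 4 students rating 3 items.
    toy_ratings = [
        ["R", "HOS", "HOS"],
        ["R", "HOS", "R"],
        ["R", "R", "R"],
        ["HOS", "R", "R"],
    ]
    toy_levels = ["Knowledge", "Analysis", "Application"]
    toy_scores = score_matrix(toy_ratings, toy_levels)
    print(majority_correct(toy_scores))
    print(hoyt_reliability(toy_scores))
```

With this formulation the coefficient is algebraically the same as coefficient
alpha computed on the 0/1 scores, so either route should give the same value.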

Results 

The data for glaciers and atomic structure were very similar, and so are 
considered jointly. 

Number of Correct Classifications/Item

                   Glaciers Items        Atomic Structure Items
                   Recall    H.O.S.      Recall         H.O.S.
Penn                 5        11           6             12
Ithaca               5        11       [illegible]       10
Internal
Consistency         .52       .79         .65            .75

Students were clearly able to correctly classify the Recall items. The 
lowest proportion of correct classification was 16/26 for one of the glacier 
items; all other proportions were higher. 

The written comments indicate that the structure of the item was the basis
for the students' judgments.

For example:

"It usually depended on the wording of the question. Certain
key words helped me to decide: 'relationship,' 'approximately,'
'plausible reason,' 'assumption,' 'best estimate,' 'might be,'
which were HOS questions. The R questions seemed to be more direct
with direct answers."

"R items seemed to be the ones you could answer immediately
from having memorized them. HOS items you usually knew a few facts
from memorization but were asked to use all this knowledge and come
up with the best answer. More time is needed to answer this type."

"'Can best be described' questions are HOS because the student
must organize the information from the passage into a logical answer.
Best answer = best prediction = HOS."

Discussion 

Accurate classification of Knowledge, Application, and Analysis items
was obtained. In fact, only 7 of the 36 items in these areas were incorrectly
dichotomized, whereas 9 of the 12 Comprehension items were so rated. This
finding adds credence to the structure hypothesis in that the Comprehension
items that were not of the "can best be described" variety are structurally
indistinguishable from recall items. For example, consider the following item:

The serious study of glaciers began about the time of the: 

A Civil War. 

B Golden Age of Greece. 

C French Revolution.

D discovery of America. 

The passage on which this item is based says that "Glaciers have been 
studied seriously for a little more than 100 years." Since the students in 
this study did not read the passages, the item above can clearly appear to
be testing recall of specific information. As indicated, however, there was 
one item on glaciers presumably testing recall of specific information in which 
part of the stem used the word "best." This item was presumed by the students 
to be testing some higher-order skill when in fact the test constructor intended 
it to measure recall only. Thus, inducing an appropriate "process demand" or
"set" may be an important, but neglected, aspect of item construction.

The results of this preliminary study should be viewed in the context
of the "extra cognitive" aspects of multiple-choice testing, that is, the
situational and structural factors that affect the percent passing an item
and yet are not clearly related to the processes the item is intended to measure.

The results of this study seem important in criterion-referenced measurement.
If the purpose of such a test is to yield a score which can indicate the
behaviors in which students have engaged, then the process expectancy that
the item's structure creates may be a source of systematic measurement error
affecting the validity of inferences made from these scores. Systematic
understanding of the relation between choice behavior influenced by expectancy
(via structure) on the one hand and cognitive processes (via content) on the other
appears essential for the construction of adequate criterion-referenced items
capable of distinguishing between inference and recall.